Computational and Structural Biotechnology Journal — Latest Matching Preprints

1

Computational Design and Atomistic Validation of a High-Affinity VHH Nanobody Targeting the PI/RuvC Interface of Streptococcus pyogenes Cas9: A Bivalent Hub Strategy for CRISPR-Cas9 Enhancement

Kumar, N.; Dalal, D.; Sharma, V.

2026-03-25 bioinformatics 10.64898/2026.03.22.713495 medRxiv

Top 0.1%

27.2%

The CRISPR-Cas9 system has revolutionized genome engineering, yet its full therapeutic potential remains constrained by challenges in precisely modulating its activity and specificity. Here we report a fully computational end-to-end pipeline for the de novo design of a single-domain VHH nanobody (NbSpCas9-v1) targeting a structurally conserved, non-catalytic epitope at the PAM-interacting (PI) and RuvC-III interface of Streptococcus pyogenes Cas9 (SpCas9; PDB: 4UN3). Nanobody sequences were generated using BoltzGen, a generative diffusion framework for binder design, and co-folded with SpCas9 using Boltz-2 to evaluate structural confidence and binding affinity. The top-ranked model (SpCas9_4UN3_Bivalent_Hub_v1) achieved a complex pLDDT of 0.8406, an aggregate score of 0.8016, and an ipTM > 0.8, indicating high confidence in the nanobody-antigen interface. The designed 1,616-residue quaternary complex (SpCas9 + sgRNA + DNA + nanobody) was subjected to 10 ns of all-atom molecular dynamics (MD) simulation using the AMBER14SB force field within the GROMACS/OpenMM framework. The complex stabilized at an RMSD of ~6 Å with a radius of gyration of 39-44 Å, confirming thermodynamic stability under physiological conditions (310 K, 0.15 M NaCl). A conserved 96.3 Å inter-molecular distance between the nanobody centroid and the HNH catalytic residue H840 establishes NbSpCas9-v1 as a distal, non-inhibitory binder, ideally suited for a Bivalent Hub architecture recruiting secondary effectors to the Cas9 ribonucleoprotein (RNP). The nanobody-Cas9 interface is stabilized by 8 hydrogen bonds, 4 salt bridges, and ~1,850 Å² of buried solvent-accessible surface area. These results provide a rigorous structural and dynamic foundation for experimental validation of VHH-based CRISPR-Cas9 enhancers and modulators.
Graphical abstract: The computational workflow proceeds from SpCas9 crystal structure acquisition (PDB: 4UN3) through BoltzGen nanobody design, Boltz-2 structural co-folding, 10 ns explicit-solvent MD validation, and Bivalent Hub functional characterization. The PyMOL rendering below shows the full quaternary complex at atomistic resolution.
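The RMSD, radius of gyration, and centroid-to-residue distance quoted above follow standard geometric definitions. The sketch below illustrates them under stated assumptions: coordinates are plain (x, y, z) tuples in Å, trajectory frames are assumed to be already superposed onto the reference (no fitting step is included), and the toy coordinates are hypothetical. It is not the authors' analysis code; MD packages provide their own tools for these quantities.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(coords: Sequence[Coord]) -> Coord:
    """Unweighted geometric centre of a set of atomic coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation; assumes `frame` is already superposed onto `reference`."""
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(math.dist(r, f) ** 2 for r, f in zip(reference, frame))
    return math.sqrt(squared / len(reference))


def radius_of_gyration(coords: Sequence[Coord], masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration (unit masses if none are given)."""
    weights = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    com = tuple(sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3))
    return math.sqrt(sum(w * math.dist(c, com) ** 2 for w, c in zip(weights, coords)) / total)


# Toy coordinates in Å (hypothetical, not taken from the SpCas9 complex).
nanobody_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
h840_atom = (1.0, 0.0, 96.3)
print(math.dist(centroid(nanobody_atoms), h840_atom))
```

In a real analysis the distance would be tracked per frame to show that it stays roughly constant over the trajectory.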

2

Molecular basis of Salla Disease: R39C Mutation Effects on the Lysosomal Transporter Sialin

Matsingos, C.; Lot, I.; Vaz, M.; Mailliart, J.; Boulayat, M.; Debacker, C.; Goupil-Lamy, A.; Gasnier, B.; Acher, F. C.; Anne, C.

2026-04-22 biochemistry 10.64898/2026.04.20.719580 medRxiv

Top 0.1%

23.9%

Salla disease is caused by a genetic mutation in sialin, a lysosomal membrane transporter, which exports sialic acid from lysosomes. Substrate translocation occurs via a rocker-switch mechanism that alternately exposes the substrate-binding site to the lysosomal lumen and the cytosol. The pathogenic mutation R39C found in most Salla disease patients decreases the lysosomal localisation and the transport activity. In this study, we used computational and mutagenesis approaches to elucidate the molecular effects of the R39C mutation. Using three-dimensional models of human sialin in the lumen-open (LO) and cytosol-open (CO) states combined with the mutagenesis of selected residues, we identify a critical "triplet" motif comprising R39, E194, and E262, which is associated with an ionic lock formed between K197 and D350 in the LO conformation. Molecular dynamics simulations suggest that the electrostatic triplet negatively modulates the ionic lock, and are consistent with a strengthened ionic lock in R39C sialin, potentially favouring the LO state. To assess the global effects of the R39C mutation, we computed dynamic cross-correlation matrices and identified correlation patterns consistent with an allosteric coupling between the ionic lock K197/D350 and the region surrounding the sialic acid binding site in wild-type sialin, whereas in the LO state of R39C sialin, this communication preferentially bypasses this region. Therefore, the R39C mutation may impede the LO to CO conformational transition required for sialic acid transport, providing a plausible mechanistic framework for the decreased transport activity, and possibly the decreased lysosomal localisation, observed in Salla disease. 
Highlights:
- The R39 residue participates in an interaction triplet, which negatively regulates an ionic lock stabilising the lumen-open conformation
- The R39C mutation is associated with a stronger ionic lock in the simulations, and may favour the lumen-open state
- Correlation network analysis suggests an allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site
- The R39C mutation alters the inferred allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site

[Graphical abstract: Figure 1]
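Dynamic cross-correlation matrices are commonly defined as C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where Δr_i is the displacement of atom i from its mean position and <·> averages over frames. A minimal sketch, assuming a trajectory already aligned to remove global rotation and translation and stored as `traj[t][i]` (an illustrative layout, not the study's); values near +1 indicate correlated motion and values near -1 anti-correlated motion.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dccm(traj: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Dynamic cross-correlation matrix; traj[t][i] is the (x, y, z) of atom i in frame t."""
    n_frames, n_atoms = len(traj), len(traj[0])
    # Mean position of each atom over all frames.
    means = [
        tuple(sum(frame[i][k] for frame in traj) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Displacement of each atom from its mean position, per frame.
    disp = [
        [tuple(frame[i][k] - means[i][k] for k in range(3)) for i in range(n_atoms)]
        for frame in traj
    ]

    def mean_dot(i: int, j: int) -> float:
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    msf = [mean_dot(i, i) for i in range(n_atoms)]  # mean-square fluctuations
    matrix = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            denom = math.sqrt(msf[i] * msf[j])
            matrix[i][j] = mean_dot(i, j) / denom if denom > 0 else 0.0
    return matrix


# Toy trajectory (hypothetical): atoms 0 and 1 move together, atom 2 moves opposite.
toy = [[(float(t), 0.0, 0.0), (t + 5.0, 0.0, 0.0), (-float(t), 0.0, 0.0)] for t in range(3)]
c = dccm(toy)
print(round(c[0][1], 3), round(c[0][2], 3))
```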

3

Smart AI-Powered Machine Learning Risk Assessment for Early Osteoporosis Detection for Women Bone Health

Monfared, V.

2026-06-02 orthopedics 10.64898/2026.05.31.26354550 medRxiv

Top 0.1%

19.6%

Osteoporosis is often called a silent disease because it progresses without symptoms until a fracture occurs, posing a serious, yet frequently overlooked, threat to women's health. In response to the pressing need for early detection, we introduce OsteoInsight, an intelligent, AI-powered web application designed to assess osteoporosis risk with both clinical accuracy and interpretability. Built on a Random Forest classifier trained on more than 2,000 women's health records, our model incorporates a wide range of domain-informed features, including hormonal history, lifestyle factors, reproductive health, and conditions affecting bone health. Despite an imbalanced dataset, with around 75% of cases being osteoporosis-positive, the model achieved encouraging results: 71.81% accuracy, an F1-score of 0.79, and an AUC-ROC of 0.78. SHAP analysis highlighted age, BMI, and menstrual history as key predictors, offering transparent insight into the model's reasoning. Additional contributors such as fracture history, signs of low estrogen, and lactation duration were also found to be significant, enriching the interpretability of predictions. These insights are integrated into the OsteoInsight user interface, making risk assessments not only accessible but also understandable for both clinicians and users. Our findings underscore the potential of AI-driven tools to enhance early screening and enable personalized risk profiling, empowering women and healthcare providers to take proactive steps in osteoporosis prevention.
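With roughly 75% positive cases, accuracy and F1-score are easiest to interpret against a majority-class baseline. The sketch below computes the standard metrics from confusion-matrix counts; the 1,500/500 split of 2,000 records is a hypothetical assumption that mirrors the stated prevalence, not the study's data.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical cohort: 2,000 records, 75% positive, scored by a trivial
# classifier that labels every record osteoporosis-positive.
baseline = classification_metrics(tp=1500, fp=500, fn=0, tn=0)
print(baseline)
```

Because such a classifier gives every record the same score, its AUC-ROC is 0.5, which is why AUC is arguably the most informative of the reported metrics under this class imbalance.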

4

Introducing non-enzymatic crosslinks into atomistic simulations of collagen fibrils

Giannetti, G.; Pils, J.; Graeter, F.; Monego, D.; Dellago, C.

2026-03-16 bioinformatics 10.64898/2026.03.13.711566 medRxiv

Top 0.1%

18.7%

Motivation: Collagen fibrils are the primary load-bearing units of connective tissues. However, generating atomistic, simulation-ready models remains challenging due to collagen's hierarchical organization and the diversity of its crosslinking network across tissues, ages, and metabolic states. Notably, non-enzymatic advanced glycation end-product (AGE) crosslinks, central to aging and diabetic complications, are largely absent from current atomistic fibril modelling workflows. Results: Here, we present an extension of the ColBuilder framework to generate atomistic collagen fibril models that incorporate three representative AGE-derived crosslinks (glucosepane, pentosidine, and MOLD) alongside enzymatic crosslinks. Amber99-compatible parameters are provided and assessed against QM-optimized reference geometries using all-atom molecular dynamics (MD) simulations. As a proof of concept, we examine the mechanical response of single D-period collagen microfibrils featuring enzymatic-only, AGE-only, and mixed crosslink patterns in MD simulations under force, and observe that AGE crosslinks affect the fibril structure differently from enzymatic crosslinks. The extension to ColBuilder can aid future structure-based research on collagen aging. Availability and implementation: ColBuilder is available as an open-source Python command-line package at https://github.com/graeter-group/colbuilder.

5

Interplay of the ribosome A and CAR sites

Raval, M.; Zhou, Y.; Lynch, M.; Krizanc, D.; Thayer, K.; Weir, M. P.

2026-04-09 systems biology 10.64898/2026.04.07.714784 medRxiv

Top 0.1%

18.3%

Protein translation is a highly regulated process influenced by multiple factors at the initiation, elongation, and termination stages. One notable regulatory element of the ribosome is the CAR interaction surface, a three-residue motif composed of C1274 and A1427 of S. cerevisiae 18S rRNA (corresponding to C1054 and A1196 in E. coli 16S rRNA) and R146 of ribosomal protein Rps3. CAR is highly conserved and positioned adjacent to the aminoacyl (A site) decoding center. It forms hydrogen bonds with the +1 codon next in line to enter the ribosome A site, acting as an extension of the tRNA anticodon and forming base-stacking interactions with nucleotide 34 of the tRNA. However, despite CAR's strategic position within the ribosome, its functional relationship with the A site remains poorly characterized. Using molecular dynamics (MD) simulations, we examined the interplay between the A site and CAR site, revealing sequence-dependent modulation of H-bonding and π-stacking interactions within and between the two sites. These findings suggest a structural and functional connection between these two regions of the ribosome that may contribute to mRNA sequence-specific tuning of translation elongation.
[Graphical abstract: Figure 1]
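The abstract does not state the geometric criteria used to count hydrogen bonds. A common convention, assumed here, is a donor-acceptor distance of at most 3.5 Å and a donor-H···acceptor angle of at least 120°. The sketch below applies that test to a single atom triplet with hypothetical coordinates; it does not cover π-stacking, which needs ring-centroid and ring-plane geometry.

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], vertex: Sequence[float], b: Sequence[float]) -> float:
    """Angle a-vertex-b in degrees."""
    u = [x - y for x, y in zip(a, vertex)]
    v = [x - y for x, y in zip(b, vertex)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric H-bond test with assumed distance (Å) and angle (degrees) cutoffs."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


# Hypothetical coordinates in Å: a linear N-H...O arrangement.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Counting such contacts per frame and comparing the counts across mRNA sequence contexts is one way to quantify sequence-dependent H-bonding.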

6

PDBe-SIFTS: an open-source tool for Structure Integration with Function, Taxonomy, and Sequences, featuring improved alignment, scoring scheme, and accelerated search

Bellaiche, A.; Choudhary, P.; Nair, S.; Harrus, D.; Yu, C. W.-H.; Tanweer, S. A.; Evans, G. L.; Lo, S. W.; Martin, M.; Fleming, J. R.; Velankar, S.

2026-05-04 bioinformatics 10.64898/2026.04.30.721839 medRxiv

Top 0.1%

14.9%

Structure Integration with Function, Taxonomy and Sequences (SIFTS) provides residue-level mappings between UniProt Knowledgebase sequences and Protein Data Bank structures and has historically been generated through internal Protein Data Bank in Europe (PDBe) pipelines. Here, PDBe-SIFTS is presented as a fully open-source, locally deployable implementation of this mapping framework. The pipeline combines fast, scalable sequence search using MMseqs2, an improved bounded scoring scheme for ranking candidate mappings, and residue-level mapping refinement based on backbone connectivity. PDBe-SIFTS is distributed as a Python package with command-line tools for 1) building a sequence search database, 2) identifying the best sequence-structure match, 3) one-to-one mapping at the residue level, and 4) generating SIFTS annotations in PDBx/mmCIF format. Benchmarking on the complete Protein Data Bank archive showed that MMseqs2 reduced archive-scale UniProtKB searches from hours with BLASTP to minutes, approximately 22-36 times faster, while curated mappings were recovered at top rank in 93.1% of cases. The remaining discrepancies mainly involved biologically ambiguous cases such as highly conserved proteins, chimeric constructs, or closely related orthologs. These results show that PDBe-SIFTS enables fast mapping, improving structural coherence in residue-level alignments while delivering the most up-to-date and accurate mappings, comparable to expert curation. Tool: https://github.com/PDBeurope/SIFTS Quick start notebook with example: https://github.com/PDBeurope/SIFTS/tree/master/notebooks Broader audience statement: Matching protein sequences to their three-dimensional structures, and mapping annotations across both, is essential for understanding protein function, interactions, and molecular mechanisms. This integrated view enables richer interpretation of biological data and underpins advances in drug discovery, disease research, and protein engineering.
PDBe-SIFTS provides an open and functional framework for structure-sequence mapping, allowing researchers and databases to run, inspect, and extend these mappings locally, while benefiting from faster searches, transparent scoring, and structurally informed residue-level alignments.
[Graphical abstract: Figure 1]
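At its core, a residue-level mapping turns a pairwise alignment into a one-to-one index correspondence. A minimal sketch, assuming gapped alignment strings with "-" as the gap character and 1-based residue numbering; `residue_mapping` is a hypothetical helper, not part of the PDBe-SIFTS package, and it omits the backbone-connectivity refinement described above.

```python
from typing import Dict


def residue_mapping(aligned_query: str, aligned_target: str,
                    query_start: int = 1, target_start: int = 1) -> Dict[int, int]:
    """Map query residue numbers to target residue numbers for aligned (non-gap) columns."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    qi, ti = query_start, target_start
    for q, t in zip(aligned_query, aligned_target):
        if q != "-" and t != "-":
            mapping[qi] = ti  # both residues present: record the pair
        if q != "-":
            qi += 1  # advance the query counter only on real residues
        if t != "-":
            ti += 1
    return mapping


# Hypothetical alignment: one insertion in the target, one unmatched query residue.
print(residue_mapping("AC-GT", "ACTG-"))
```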

7

Bridging LLM Reasoning and Chemical Knowledge via an Evolutionary Multi-Agent Framework for Molecular Synthesis

Chen, Y.; Rao, J.; Xie, J.; Sun, Y.; Yang, Y.

2026-05-06 bioinformatics 10.64898/2026.05.02.722342 medRxiv

Top 0.1%

14.6%

Motivation: Molecular design faces the dual challenge of navigating a vast chemical space while ensuring experimental synthesizability. Traditional models are constrained by small datasets, which restricts their scalability and broader chemical context. In contrast, Large Language Models (LLMs) encapsulate extensive synthesis protocols derived from vast scientific literature, yet they struggle to leverage this potential due to severe hallucinations and a superficial grasp of rigorous chemical logic. Results: We propose EvoSyn, an evolutionary multi-agent framework that synergizes LLM reasoning with domain experts for preference-aware molecular synthesis. EvoSyn orchestrates a dual-process evolutionary paradigm: a co-evolving process that collaboratively aligns linguistic capabilities with multi-objective constraints, and a self-evolving process formulated as a Markov Game. Through evolution and reinforcement learning, agents actively learn from mistakes, utilizing domain feedback to penalize invalid proposals and ground generation in feasible reaction pathways. Extensive evaluations on comprehensive benchmarks demonstrate that EvoSyn significantly outperforms state-of-the-art baselines. These results highlight that by integrating LLM-guided self-evolution with rigorous domain validation to mitigate hallucinations, EvoSyn effectively yields molecules that are both bioactive and synthetically actionable. Availability and implementation: Implementation code is available as supplementary material. Contact: yangyd25@mail.sysu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
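The idea of using domain feedback to penalize invalid proposals can be illustrated with a generic elitist evolutionary loop. In this minimal sketch the LLM agents, the Markov Game, and the reinforcement learning are all replaced by random point mutations, and the scoring and validity functions are hypothetical toy stand-ins; it shows only how a penalty term steers selection toward valid candidates, not how EvoSyn is implemented.

```python
import random
from typing import Callable, List


def evolve(
    population: List[str],
    mutate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    is_valid: Callable[[str], bool],
    generations: int = 20,
    keep: int = 4,
    penalty: float = 10.0,
    seed: int = 0,
) -> List[str]:
    """Elitist evolutionary loop: invalid candidates are penalised, not discarded."""
    rng = random.Random(seed)

    def fitness(candidate: str) -> float:
        return score(candidate) - (0.0 if is_valid(candidate) else penalty)

    # Sort by fitness, breaking ties by the string itself for reproducibility.
    pop = sorted(set(population), key=lambda c: (-fitness(c), c))[:keep]
    for _ in range(generations):
        children = [mutate(rng.choice(pop), rng) for _ in range(len(pop))]
        # Parents stay in the pool, so the best fitness never decreases.
        pool = set(pop) | set(children)
        pop = sorted(pool, key=lambda c: (-fitness(c), c))[:keep]
    return pop


# Hypothetical toy stand-ins: strings over "CNO", score = number of "C",
# and a validity rule forbidding "OO" in place of a real chemistry check.
def point_mutation(s: str, rng: random.Random) -> str:
    i = rng.randrange(len(s))
    return s[:i] + rng.choice("CNO") + s[i + 1:]


best = evolve(["NNNNNN", "OONNNN", "CCOOCC"], point_mutation,
              score=lambda s: s.count("C"), is_valid=lambda s: "OO" not in s)
print(best[0])
```

Keeping penalised candidates in the pool, rather than deleting them, lets the loop recover from near-valid proposals, which loosely mirrors the "learn from mistakes" framing above.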

8

A novel SXXLF motif in the FXR N-terminal domain mediates coregulator and interdomain interactions

Villalona, P.; Pulahinge, T.; Yu, T.; Wenning, J.; Frisbie, C. J.; Magafas, J.; Okafor, C. D.

2026-05-20 biochemistry 10.64898/2026.05.18.724725 medRxiv

Top 0.1%

14.2%

The nuclear receptor superfamily comprises ligand-regulated transcription factors that contain an intrinsically disordered domain at the amino-terminal end, known as the N-terminal domain (NTD). While this poorly conserved domain is known to possess ligand-independent activation function (AF-1), few NTD functions are conserved between nuclear receptors (NRs). NTD roles have been identified in other receptors, including the androgen receptor (AR), estrogen receptor (ER), and mineralocorticoid receptor (MR). Here, we aim to define the function of the NTD of the farnesoid X receptor (FXR), a crucial regulator of lipid and bile acid metabolism. We show that the NTD engages in interdomain contact with other FXR domains. We also observe that the NTD interacts directly with coregulator proteins. Using mutagenesis, mammalian two-hybrid assays and molecular dynamics simulations, we identify and validate a novel SXXLF motif in the NTD which mediates interactions with both coregulators and the ligand binding domain. Mutation of the motif induces large changes in conformational and allosteric coupling in FXR. Our study identifies a new nuclear receptor-interacting motif that modulates the transcriptional activity of FXR. Graphical abstract: FXR-NTD regulates transcriptional activity through interdomain communication with the LBD and is also involved in co-activator recruitment. The SENLF motif is the first defined functional element within the FXR-NTD and mediates both NTD-LBD interaction and selective co-activator engagement to drive NTD-mediated transcriptional activity.
[Figure 1]
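In motif notation, SXXLF denotes serine, any two residues, leucine, then phenylalanine; the SENLF motif named in the graphical abstract is one instance. A minimal sketch for scanning a protein sequence for this pattern with Python's `re` module, using a lookahead so overlapping hits are all reported; the input sequences are hypothetical, not the FXR NTD.

```python
import re
from typing import List, Tuple

# S, any two residue letters, L, F. The lookahead makes overlapping hits visible.
SXXLF = re.compile(r"(?=(S[A-Z]{2}LF))")


def find_sxxlf(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched motif) for every SXXLF occurrence."""
    return [(m.start() + 1, m.group(1)) for m in SXXLF.finditer(seq.upper())]


# Hypothetical sequence fragment containing one SENLF instance.
print(find_sxxlf("MKSENLFQ"))
```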

9

PEPR-GNN: Perturbation-Enhancer-Promoter-RNA Graph Neural Networks for Multiome Perturb-Seq modeling of regulomes

Markham, Z. E.; Li, B.; Nguyen, L.; Wang, L.; Munshi, N. V.; Hon, G. C.

2026-05-06 genomics 10.64898/2026.05.05.722311 medRxiv

Top 0.1%

14.2%

Cellular reprogramming is a complex interplay between perturbations and regulatory elements, culminating in gene expression changes. Current computational approaches do not explicitly model these regulatory interactions. Here, we performed combinatorial reprogramming with cardiac transcription factors, followed by Multiome Perturb-Seq to measure perturbations, open chromatin, and gene expression in individual cells. We then developed PEPR-GNN (Perturbation-Enhancer-Promoter-RNA Graph Neural Network), a theoretical and computational framework to model regulome responses during complex genetic perturbations. By statistically associating gene regulatory relationships, PEPR-GNN organizes genes into regulomes with shared gene regulatory responses to reprogramming, including easy-to-reprogram cardiac genes, difficult-to-reprogram fibroblast genes, and context-specific genes where the impact of a reprogramming factor depends on the presence of others. Finally, we use PEPR-GNN for in silico modeling of how genetic modifications of enhancers can be used to tune gene responses to reprogramming. Overall, through the use of causal perturbation information and an enhancer-aware regulome model of gene regulation, PEPR-GNN can effectively model complex cellular responses to perturbation.
Highlights:
- Multiome Perturb-Seq of GHMT reprogramming in MEFs with RNA/ATAC-Seq readout.
- PEPR-GNN: a computational framework to model perturbation-induced regulomes.
- PEPR-GNN aids the interpretation of regulomes grouped by diverse reprogramming responses.
- PEPR-GNN enables in silico perturbation to tune gene responses to reprogramming.

10

Learning from Drops: AI-Guided Integration of Liquid Biopsy Features in Cancer Studies

Andueza, M.; Villoslada-Blanco, P.; De Dreuille, B.; Alonso, L.; Sabroso-Lasa, S.; Pantel, K.; Alix-Panabieres, C.; Lopez de Maturana, E.; Malats, N.

2026-05-17 bioinformatics 10.64898/2026.05.12.724535 medRxiv

Top 0.1%

14.2%

Cancer is a major global health issue with rising incidence and mortality. Early detection, tumor characterization, and disease surveillance are crucial for timely and effective treatment, ultimately reducing mortality rates. Liquid biopsy (LB) has emerged as a valuable detection tool, offering a non-invasive method to detect tumor-derived biomarkers in body fluids with demonstrated translational potential. To increase biomarker sensitivity, high-throughput sequencing platforms deliver massive volumes of data, and Artificial Intelligence (AI) is pivotal for integrating such large and complex datasets. This contribution aims to assess the current state of integrative AI-based research in the LB field and provide methodological guidance. First, we conducted a PubMed search and found that studies integrating LB features, particularly using AI, are sparse. When adopting an AI-based integrative approach, defining the study objectives is crucial to guide the subsequent methodological aspects, including study design, patient selection criteria, sample size, nature of the LB features, and metadata to collect. Specifically, we propose strategies and tools for data preprocessing, including normalization and batch correction, as well as handling outliers and missing data. Furthermore, we recommend machine and deep learning approaches and feature selection techniques to ensure model robustness, and we highlight the importance of rigorous internal and external validation of the selected models. Assessing clinical utility and interpretability is often overlooked but fundamental for real-world implementation. In conclusion, we provide the LB scientific community with AI-based methodological guidance to bridge the two fields and enhance the integrative analysis of LB features. Graphical abstract: Workflow chart for multiomics integrative studies in the liquid biopsy field.
Note: CTCs, circulating tumor cells; ctDNA, circulating tumor-DNA; TEPs, tumor-educated platelets; miRNA, microRNA; cfRNAs, cell-free RNAs.
[Figure 1]
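The preprocessing steps recommended above can be sketched for a single feature. This example assumes within-batch median imputation and within-batch mean-centering as simple stand-ins; dedicated batch-correction methods (for example, empirical Bayes approaches such as ComBat) also adjust scale and are generally preferable. The data and function names are hypothetical and are not taken from the guidance itself.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def zscore(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def preprocess(values: List[Optional[float]], batches: List[str]) -> List[float]:
    """Within-batch median imputation and mean-centering, then a global z-score."""
    groups: Dict[str, List[Optional[float]]] = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    centered = {}
    for b, vs in groups.items():
        filled = impute_median(vs)
        batch_mean = mean(filled)
        centered[b] = [v - batch_mean for v in filled]
    # Reassemble in the original sample order.
    iters = {b: iter(vs) for b, vs in centered.items()}
    return zscore([next(iters[b]) for b in batches])


# Hypothetical liquid-biopsy feature measured in two sequencing batches.
result = preprocess([1.0, None, 3.0, 10.0, 12.0], ["A", "A", "A", "B", "B"])
print([round(x, 3) for x in result])
```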

11

RT-nested and interfering-Primer PCR reveal prevalent isoform-specific A-to-I RNA editing in neuronal genes

Wang, Z.; Ni, Y.; Cai, W.; Li, H.; Duan, Y.

2026-05-17 molecular biology 10.64898/2026.05.15.725286 medRxiv

Top 0.1%

12.9%

Background: Metazoan adenosine-to-inosine (A-to-I) mRNA editing temporospatially diversifies the neuronal transcriptome and proteome. The limited read length from next-generation sequencing (NGS) constrains the quantification of potentially differential editing levels across different splicing isoforms, restricting our understanding of the extent to which RNA editing contributes to molecular diversity and its interplay with splicing. Methods: We employed reverse transcription nested PCR (RT-nPCR) and developed a novel interfering-Primer PCR (iPrimer PCR) technique to distinguish different transcripts of any gene. We selected multiple essential genes exhibiting RNA editing in coding sequences (CDSs) or untranslated regions (UTRs) for isoform-specific amplification and Sanger sequencing. Results: Nine different Adar isoforms together with pre-mRNA had distinct editing levels at the S>G auto-recoding site, which was predicted to have isoform-specific effects on catalytic activities. Although pre-mRNA editing might exert isoform-dependent promotion/suppression of splicing, closely located editing sites, such as those in neuronal genes qvr and stj, still exhibited high correlation in editing levels due to co-editing. The iPrimer strategy further discovered differential recoding levels between the long/short 3′ UTR isoforms of gene jef. Conclusions: We provide the first comprehensive solution for isoform-specific PCR amplification of any gene, enabling quantification of RNA editing levels of different isoforms. Our results offer insights into how RNA editing interplays with splicing, and highlight its complicated role in expanding molecular diversity.
[Graphical abstract: Figure 1]
We developed isoform-specific PCR followed by Sanger sequencing, and quantified differential RNA editing levels across different transcripts of a gene.
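Because inosine is read as guanosine after reverse transcription, editing at a site is often estimated from Sanger chromatograms as the G peak height divided by the sum of the A and G peak heights. The abstract does not detail the exact quantification used; the sketch below assumes peak heights have already been extracted, and the isoform names and values are hypothetical.

```python
from typing import Dict, Tuple


def editing_level(a_peak: float, g_peak: float) -> float:
    """Fraction edited at one site, approximated as G / (A + G) peak height."""
    total = a_peak + g_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return g_peak / total


def levels_by_isoform(peaks: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each isoform name to its editing level from (A, G) peak heights."""
    return {iso: editing_level(a, g) for iso, (a, g) in peaks.items()}


# Hypothetical peak heights at one site in two isoform-specific amplicons.
levels = levels_by_isoform({"isoform-1": (300.0, 100.0), "isoform-2": (120.0, 280.0)})
print(levels)
```

Comparing these per-isoform fractions at the same site is what isoform-specific amplification makes possible, since short NGS reads often cannot be assigned to a single isoform.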

12

Rapid and dynamic reprogramming within the tumor microenvironment drives EDA-CAR-T dysfunction and compromised therapeutic efficacy in solid tumors

Redondo-Frutos, R.; Justicia-Lirio, P.; Cervantes-Calleja, M. E.; San Martin-Uriz, P.; Aguirre-Ruiz, P.; Jordana-Urriza, L.; Garnica-Suberviola, M.; Camara-Pena, S.; Alignani, D.; Lopez, A.; Rodriguez-Diaz, S.; Martinez-Turrillas, R.; Gorraiz, M.; Bakirdogen, D.; Pocaterra, A.; Inoges, S.; Lopez-Diaz de Cerio, A.; Algul, H.; Mondino, A.; Hernaez, M.; Lasarte, J. J.; Prosper, F.; Lozano, T.; Rodriguez-Madoz, J. R.

2026-05-03 genomics 10.64898/2026.04.29.721801 medRxiv

Top 0.1%

12.8%

Background: The efficacy of chimeric antigen receptor (CAR)-T cell therapies in solid tumors remains limited, largely due to the profoundly immunosuppressive tumor microenvironment (TME), which drives CAR-T cells to dysfunction and poor persistence. A comprehensive understanding of the dynamic interplay between CAR-T cells and the TME is therefore critical for the rational design of more effective CAR-T strategies for solid cancers. Methods: Here, we performed single-cell RNA sequencing of tumor samples from immunocompetent mice treated with stroma-targeting EDA-CAR-T cells, profiling CAR-T cell states and TME programs at the peak of antitumor response and during subsequent tumor progression. Results: Our analysis revealed a marked temporal remodeling of EDA-CAR-T cells within the TME, where early antitumor efficacy is associated with concurrent expansion of cytotoxic effector CD8 CAR-T cells and activation of memory CD4 CAR-T subsets. Moreover, EDA-CAR-T cells effectively engaged the myeloid compartment, resulting in strengthened communication networks involving T cell activation. However, by tumor progression, EDA-CAR-T cells underwent widespread transcriptional reprogramming towards dysfunction, characterized by loss of effector programs alongside induction of exhaustion and immunoregulatory pathways within the TME, including PD-L1/PD-L2 and TGF-β signaling, which impairs sustained immune responses. Notably, early CAR-T cell activation led to increased susceptibility to TME-mediated immunosuppression, revealing EDA-CAR-T-specific soluble galectin-mediated cell-to-cell interaction networks. Conclusions: Together, this work offers a high-resolution view of CAR-T cell dynamics within the solid TME, uncovering cellular and molecular mechanisms of rapid functional decline and identifying regulatory pathways within the TME that can be exploited to improve CAR-T cell therapy efficacy in solid tumors.
KEY MESSAGES OF THE ARTICLE
What is already known on this topic: The determinants of CAR-T cell therapeutic efficacy in solid tumors remain poorly defined, largely due to the complexity of the immunosuppressive tumor microenvironment. In this effort, it is necessary to perform comprehensive and detailed mechanistic studies that capture CAR-T cell dynamics within the solid tumor microenvironment to understand treatment failure.
What this study adds: We performed single-cell profiling of stroma-targeting EDA-CAR-T cells, revealing their dynamic reprogramming toward dysfunction within the solid tumor microenvironment. We dissected CAR-T cell states and their cell-to-cell interactions with the tumor microenvironment across response and tumor progression and identified mechanisms linking CAR-T cell functionality and therapeutic failure.
How this study might affect research, practice or policy: This study provides comprehensive mechanistic insights from an immunocompetent model that can be leveraged to identify shared determinants of CAR-T cell functionality in solid tumors and potentially guide the rational development of improved CAR-T cell therapies.

13

Expanding the options for therapeutic exon skipping as a future treatment for USH2A-associated disease by 3D structural modeling of newly formed hybrid domains

Malinar, L.; Broekman, S.; Rademaker, D. T.; Le, A. Q.; Peters, T.; de Vrieze, E.; 't Hoen, P. A. C.; van Wijk, E.; Venselaar, H.

2026-04-28 bioinformatics 10.64898/2026.04.24.720583 medRxiv

Top 0.1%

12.8%

Usher syndrome, the leading cause of hereditary deaf-blindness affecting approximately 1 in 15,000 individuals worldwide, is currently untreatable. Antisense oligonucleotide-based exon skipping has shown significant therapeutic promise for USH2A-associated retinal dysfunction. Selection of (combinations of) exons suitable for therapeutic exon skipping within the fibronectin type 3 (FN3) domain-encoding region of USH2A currently requires that skipped exons exactly align with complete protein domains. However, only a few exon combinations meet this criterion, which significantly restricts the therapeutic potential of this strategy. Our study addresses this limitation by incorporating AlphaFold2 structural modelling into the exon skipping target selection pipeline. Following this adjusted framework, we can predict exon skipping combinations that allow remaining domain fragments to form structurally viable hybrid domains. As a proof of concept, we examined and confirmed the functionality of usherinΔexon54-58, which contains a hybrid FN3 domain, using zebrafish as a model. This highlights the potential of the newly developed paradigm for identifying exon skipping targets with potential therapeutic relevance. Our results emphasize the value of structural modeling in identifying new therapeutic exon skipping targets, aiming to improve precision, efficiency, applicability, and cost-effectiveness in the development of genetic therapies for hereditary diseases such as Usher syndrome.
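The two selection criteria discussed above can be expressed as simple checks. A minimal sketch, assuming exon lengths in nucleotides and domain boundaries as inclusive residue ranges (all values hypothetical, not USH2A annotations): a skip preserves the reading frame when the removed length is a multiple of 3, and the original whole-domain criterion holds when no domain is only partially removed. Skips that fail the second check are the cases where structural modelling, such as AlphaFold2, would be needed to judge whether the leftover fragments form a viable hybrid domain.

```python
from typing import Dict, List, Tuple


def is_in_frame(exon_lengths: Dict[int, int], skipped: List[int]) -> bool:
    """True if removing the skipped exons preserves the reading frame."""
    return sum(exon_lengths[e] for e in skipped) % 3 == 0


def partially_cut_domains(cut: Tuple[int, int],
                          domains: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Domains (inclusive residue ranges) overlapping the removed residues without being removed entirely."""
    start, end = cut
    partial = []
    for d_start, d_end in domains:
        overlaps = d_start <= end and start <= d_end
        fully_removed = start <= d_start and d_end <= end
        if overlaps and not fully_removed:
            partial.append((d_start, d_end))
    return partial


# Hypothetical exon lengths (nt) and FN3-like domain ranges (residues).
exons = {54: 90, 55: 120, 56: 93}
domains = [(1, 100), (101, 200), (201, 300)]
print(is_in_frame(exons, [54, 55, 56]), partially_cut_domains((150, 250), domains))
```

For the hypothetical cut at residues 150-250, the leftover fragments 101-149 and 251-300 would be joined, forming a candidate hybrid domain.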

14

An Expert-Informed Synthetic Animal Data Generator: A Physiology-Consistent Generative Framework for High-Fidelity Animal Digital Twins

Youssef, A.; Sun, C.; Norton, T.

2026-04-27 bioengineering 10.64898/2026.04.23.720335 medRxiv

Top 0.1%

12.8%

Digital twins are increasingly recognized as a transformative technology for precision livestock farming; however, a major bottleneck in their development remains the scarcity of high-quality, high-granularity physiological data. This study introduces the expert-informed conditional diffusion (EICD) framework, a novel approach to synthesizing high-fidelity metabolic time-series trajectories by embedding mechanistic biological principles directly into the generative process. While traditional generative models often prioritize statistical pattern-matching over biological reality, frequently resulting in physiological hallucinations, the EICD framework utilizes a physiology loss function (PhLF) to act as a form of mechanistic regularization. This guardrail penalizes samples that contradict expert-defined constraints, such as the laws of porcine bioenergetics, effectively steering the model toward a realistic physiological manifold. The framework was validated using an empirical dataset of growing pigs under varying thermal conditions. Quantitative results demonstrate near-perfect statistical distributional fidelity, with the model achieving an average Jensen-Shannon divergence (JSD) of 0.062 and a Kullback-Leibler divergence (KLD) of 0.19. The full EICD model produced a mean energy expenditure (EE) of 284.94 ± 38.70 kJ/kg/day, mirroring the empirical average of 281.33 ± 41.58 kJ/kg/day. In contrast, the standard generative diffusion model (i.e., with no physiology guardrail) exhibited significant distributional drift, yielding a mean EE of 334.41 kJ/kg/day. The biological integrity of the model was further assessed using the biological violation rate (BVR), a novel metric defined as the percentage of generated samples that fall outside the physically possible metabolic boundaries established by species-specific laws.
While the standard diffusion model produced frequent biological artifacts, the EICD framework successfully suppressed these hallucinations, ensuring that synthetic trajectories remain strictly grounded in mechanistic laws. Despite these advances, limitations remain at physiological extremes, where individual stochasticity is high. By providing a reliable method for generating physiology-consistent synthetic data, this framework offers a robust foundation for the next generation of animal digital twins.
Highlights:
- A novel expert-informed conditional diffusion (EICD) framework is proposed for physiology-consistent synthetic data generation in precision livestock farming.
- A physiology loss function (PhLF) embeds species-specific bioenergetic laws directly into the generative process as a mechanistic guardrail.
- The framework achieves near-perfect distributional fidelity (JSD = 0.062) while suppressing physiological hallucinations (BVR = 0.93%).
- An ablation study confirms that biological consistency is not an emergent property of standard diffusion models but requires explicit mechanistic constraints.
- The framework provides a scalable solution for synthetic data augmentation in precision livestock farming, supporting the 3Rs and enabling high-throughput in silico experimentation.
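The two evaluation metrics named in this abstract can be illustrated with a short Python sketch: the Jensen-Shannon divergence between two histograms, and a biological violation rate computed as the percentage of samples outside a plausibility interval. The binning, the log base (2), the Gaussian toy data and the 150 to 450 kJ/kg/day bounds are illustrative assumptions only; they are not the EICD paper's settings or its species-specific laws.

```python
import numpy as np


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    `p` and `q` may be raw histogram counts; they are normalised here.
    With base 2 the result lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only bins with a > 0 contribute; there b >= a / 2 > 0, so no division by zero.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def biological_violation_rate(samples, lower, upper):
    """Percentage of samples outside [lower, upper] (bounds are hypothetical placeholders)."""
    samples = np.asarray(samples, dtype=float)
    outside = (samples < lower) | (samples > upper)
    return 100.0 * float(outside.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: Gaussians using the reported means/SDs, NOT the study's data.
    empirical = rng.normal(281.33, 41.58, 5000)
    synthetic = rng.normal(284.94, 38.70, 5000)
    bins = np.linspace(100.0, 500.0, 41)
    p, _ = np.histogram(empirical, bins=bins)
    q, _ = np.histogram(synthetic, bins=bins)
    print(f"JSD = {js_divergence(p, q):.4f}")
    print(f"BVR = {biological_violation_rate(synthetic, 150.0, 450.0):.2f}%")
```

Base-2 logarithms are used so that the JSD is bounded by 1, which makes values such as 0.062 easier to interpret; KLD is omitted because it is undefined whenever the synthetic histogram has an empty bin where the empirical one does not.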

15

SynCom101: A web-based platform for the standardized design of functionally tailored synthetic microbial communities

Jing, J.; Rockx, S.; Liu, A.; Melkonian, C.; Raaijmakers, J. M.; Garbeva, P.; Medema, M. H.

2026-04-27 bioinformatics 10.64898/2026.04.23.720341 medRxiv

Top 0.1%

12.7%

Background: Synthetic microbial communities (SynComs) are essential tools for dissecting the causal mechanisms in host-microbiota interactions. To date, however, SynCom design suffers from a lack of standardization, typically oscillating between arbitrary strain selection and computational pipelines that are misaligned with experimental design. As microbiome research transitions toward functionally defined community systems with reproducible experimental outcomes, there is a strong need for a user-friendly platform that integrates multi-dimensional genomic and/or biological data into standardized, tailored SynCom design. Results: Here, we present SynCom101, a web-based platform that democratizes the design of reproducible, hypothesis-driven SynComs. SynCom101 accommodates diverse input formats, including genomic annotations and laboratory-obtained phenotypic traits, allowing users to customize their design criteria with high flexibility. The platform uses a parsimony algorithm to ensure computational scalability for large datasets, complemented by an optional correlation-aware mode that accounts for microbial compatibility and co-occurrence patterns when data on ecological interactions among strains are available. A core innovation of SynCom101 is its suite of trait-weighting modules, which enable researchers to guide the selection algorithm toward maximal functional trait coverage, the emulation of natural community architectures, or the enrichment of positively correlated microbial assemblages to enhance community stability. We showcase the platform's functionality through in silico design of communities from different datasets, demonstrating its capacity to generate concise, functionally prioritized SynComs aligned with targeted design objectives. Conclusion: By providing a transparent, parameter-documented workflow, SynCom101 ensures that community design is no longer a "black box" but a reproducible scientific record.
This platform establishes a necessary standard for in silico community assembly, facilitating the transition from descriptive microbiome studies toward high-throughput, predictive functional screening and cross-study comparability. Availability: SynCom101 can be accessed via the web interface (https://syncom101.bioinformatics.nl/). The datasets used for the case studies are available on Zenodo (https://doi.org/10.5281/zenodo.18310451). The source code is available in a Git repository (https://git.wur.nl/jiayi.jing/syncom101).
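The abstract does not specify SynCom101's parsimony algorithm. As a hedged stand-in, the sketch below uses greedy weighted set cover, a common heuristic for choosing a small set of items that together cover a set of required features. The function name, strain names and trait names are hypothetical, and the optional `weights` argument only loosely mirrors the idea of trait weighting; it is not the platform's trait-weighting module.

```python
from typing import Dict, List, Optional, Set


def greedy_trait_cover(
    strain_traits: Dict[str, Set[str]],
    target: Optional[Set[str]] = None,
    weights: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Pick a small set of strains that together cover the target traits.

    Greedy set cover: at each step, add the strain whose still-uncovered traits
    carry the largest total weight (default 1.0 per trait; weights are assumed
    positive). Ties go to the strain listed first in `strain_traits`.
    """
    if target is None:
        target = set().union(*strain_traits.values())
    weights = weights or {}
    uncovered = set(target)
    chosen: List[str] = []

    def gain(strain: str) -> float:
        return sum(weights.get(t, 1.0) for t in strain_traits[strain] & uncovered)

    while uncovered:
        candidates = [s for s in strain_traits if s not in chosen]
        if not candidates:
            break
        best = max(candidates, key=gain)  # max() returns the first maximal candidate
        newly_covered = strain_traits[best] & uncovered
        if not newly_covered:
            break  # remaining target traits are not carried by any strain
        chosen.append(best)
        uncovered -= newly_covered
    return chosen


if __name__ == "__main__":
    # Hypothetical strains and functional traits.
    strains = {
        "strain_A": {"n_fixation", "p_solubilization"},
        "strain_B": {"p_solubilization", "siderophore"},
        "strain_C": {"n_fixation", "siderophore", "iaa"},
        "strain_D": {"iaa"},
    }
    print(greedy_trait_cover(strains))  # prints ['strain_C', 'strain_A']
```

Greedy set cover does not guarantee the smallest possible community, but it scales to large strain collections, which is the trade-off the abstract attributes to a parsimony-based design.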

16

Transcriptional regulators predicted to drive macrophage dysregulation during impaired wound healing in diabetic mice

Lukas, B. E.; Pang, J.; Dai, Y.; Koh, T. J.

2026-04-24 immunology 10.64898/2026.04.21.719960 medRxiv

Top 0.1%

12.6%

Dysregulation of monocyte/macrophage (Mo/Mφ) activity is known to contribute to impaired healing in diabetes; however, the mechanisms underlying this dysregulation are not well understood. In this study, we used a variety of bioinformatics approaches, together with our time-series scRNA-seq data on wound Mo/Mφ from non-diabetic and diabetic mice, to identify transcriptional regulators (TRs) that drive Mo/Mφ state transitions during normal and impaired healing. First, we used the Lamian framework and our newly developed Pseudotime Graph Diffusion method to show that, during impaired healing in diabetic mice, transitions from early-stage phenotypes to the later-stage reparative and antigen-presenting phenotypes characteristic of normally healing wounds are impaired, whereas transitions to inflammatory, foam cell-like, and Lyve-1+ Mφ phenotypes are enhanced. Using our BITFAM model, we identified a broad range of TRs predicted to be preferentially active in each cell state. Using CellOracle, we then performed in silico perturbation and identified groups of TRs predicted to drive cell state transitions along multiple trajectories (e.g., CEBPA, IRF8), whereas other TRs were predicted to drive transitions towards reparative phenotypes (e.g., NR1H3, NR3C1) or towards an antigen-presenting phenotype (e.g., IRF4, OGT). Selected findings were validated using existing experimental data, confirming the usefulness of this approach. In conclusion, we identified TRs that likely drive Mo/Mφ state transitions towards phenotypes that are desirable or undesirable for wound healing. These findings point to novel targets for altering Mo/Mφ phenotypes to promote healing of diabetic wounds.

17

Impact of the N-glycosylation on full-length IgG2 and IgG4 antibodies: a comparative study using molecular dynamics simulations.

LEON FOUN LIN, R.; Bellaiche, A.; Diharce, J.; Etchebest, C.

2026-04-17 bioinformatics 10.64898/2026.04.14.718417 medRxiv

Top 0.1%

12.4%

Like other proteins, monoclonal antibodies, which are important biodrugs, are subject to post-translational modifications, especially N-glycosylation. However, the effects of N-glycosylation remain poorly studied, and atomistic details about their influence are rarely available. Moreover, the few existing studies focus on the prevalent immunoglobulin G1 (IgG1). To further understand the impact of glycosylation, we carried out a comparative exploration of the effect of N-glycosylation on two different antibody subclasses, namely Mab231, an IgG2, and pembrolizumab, an IgG4. The two antibodies differ in their sequences, lengths, and 3D structures, but also in the location and composition of their glycans. In the present work, detailed information was gained through molecular dynamics simulations in which both monoclonal antibodies were studied with and without their glycans. The results of 1.5 µs of sampling for each system show that glycosylation does not drastically alter the overall conformational landscape of either antibody, whichever metric is considered. However, it measurably modulates local flexibility, inter-domain correlated motions, and the relative orientation of the Fab arms with respect to the Fc domain, with statistically significant shifts in key geometric descriptors. Importantly, contact analysis reveals that glycan interactions extend beyond the Fc region to reach Fab residues. Allosteric network calculations show that the influence of Fc-bound glycans propagates as far as the Fab framework regions in both mAbs, which could affect antigen binding. The nature and magnitude of these effects are subclass-dependent, reflecting differences in glycan composition, hinge architecture, and three-dimensional organization. Our findings challenge the prevailing view that Fc glycosylation uniformly promotes CH2 domain opening.
More importantly, they underscore the necessity of considering full-length structures and IgG subclass diversity in glyco-engineering strategies.
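The abstract reports that glycans modulate local flexibility but does not name the metric used. A common way to quantify per-residue flexibility in MD trajectories is the root-mean-square fluctuation (RMSF); the NumPy sketch below computes it from a coordinate array that is assumed to be already superposed onto a common reference. The array layout and toy trajectory are assumptions, and this is not presented as the authors' analysis pipeline.

```python
import numpy as np


def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom root-mean-square fluctuation.

    coords: array of shape (n_frames, n_atoms, 3), assumed already superposed
    onto a common reference so that global rotation/translation is removed.
    Returns an array of shape (n_atoms,).
    """
    coords = np.asarray(coords, dtype=float)
    mean_pos = coords.mean(axis=0)                   # time-averaged position of each atom
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)  # squared displacement per frame, per atom
    return np.sqrt(sq_dev.mean(axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy trajectory: 100 frames, 5 atoms; atom 4 is made deliberately "floppier".
    traj = rng.normal(0.0, 0.5, size=(100, 5, 3))
    traj[:, 4, :] *= 4.0
    print(np.round(rmsf(traj), 2))
```

Comparing such per-residue profiles between the glycosylated and non-glycosylated runs is one way a "local flexibility" difference could be read off; the superposition step matters, because without it whole-molecule tumbling would dominate the fluctuations.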

18

An AI-Assisted Workflow for Reconstruction, Extension, and Calibration of Quantitative Systems Pharmacology Models.

Goryanin, I.; Checkley, S.; Demin, O.; Goryanin, I.

2026-04-07 systems biology 10.64898/2026.04.05.716273 medRxiv

Top 0.1%

12.4%

Background: Quantitative systems pharmacology (QSP) models provide mechanistic insight into drug response but are limited by labor-intensive, expert-driven workflows. We developed an AI-assisted QSP (AI-QSP) framework that integrates large language models (LLMs) with SBML-based modeling to enable automated reconstruction, extension, and calibration of mechanistic models. Methods: The framework was applied to a published CAR-T QSP model. The model was reconstructed in SBML and extended via LLM-guided prompts to incorporate key resistance mechanisms: T-cell exhaustion, PD-1/PD-L1 checkpoint regulation, and tumor antigen escape. Model development followed an iterative expert-in-the-loop workflow. The resulting model (21 reactions, 9 species) was calibrated to synthetic benchmark data using 19-parameter optimization. Model credibility was assessed using ASME V&V 40 and ICH M15 principles, including global sensitivity and profile-likelihood analyses. Results: The calibrated model reproduced benchmark dynamics with high accuracy (mean log-RMSE = 0.132). Sensitivity analysis identified CAR-T killing and bystander cytotoxicity as dominant drivers of tumor response. Profile-likelihood analysis showed that 71% of parameters were practically identifiable, with the remaining parameters prioritised for future data-driven refinement. Conclusions: AI-assisted QSP modeling enables reproducible, scalable model reconstruction and evolution while maintaining mechanistic transparency and regulatory alignment. This framework provides a foundation for accelerating model-informed drug development in cell and gene therapies.
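The abstract reports a mean log-RMSE of 0.132 without defining it. One common definition, assumed for this sketch, is the RMSE between log10-transformed observations and simulations, computed per observed species and then averaged across species. The function names and species keys below are hypothetical; the study may use a different log base or weighting.

```python
import math
from typing import Dict, Sequence


def log_rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """RMSE between log10-transformed observed and simulated values (all must be > 0)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(math.log10(o) - math.log10(s)) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(sq) / len(sq))


def mean_log_rmse(obs: Dict[str, Sequence[float]], sim: Dict[str, Sequence[float]]) -> float:
    """Average the per-species log-RMSE over all observed species."""
    return sum(log_rmse(obs[k], sim[k]) for k in obs) / len(obs)


if __name__ == "__main__":
    # Hypothetical time courses for two model species (arbitrary units).
    obs = {"tumor": [1e9, 5e8, 1e8], "car_t": [1e6, 5e7, 2e7]}
    sim = {"tumor": [1.1e9, 4e8, 1.3e8], "car_t": [1.2e6, 4.5e7, 2.5e7]}
    print(f"mean log-RMSE = {mean_log_rmse(obs, sim):.3f}")
```

Working on a log scale keeps species whose values span several orders of magnitude (such as tumor burden versus CAR-T counts) from dominating the error, and a log-RMSE of about 0.13 in log10 units corresponds to typical deviations of roughly 35% in either direction.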

19

Integrating Metabolic Networks into Hybrid Bioprocess Models

Gotsmy, M.; Guillen-Gosalbez, G.

2026-04-24 bioengineering 10.64898/2026.04.22.720062 medRxiv

Top 0.1%

12.3%

The optimization and control of bioprocesses require robust in silico models that can accurately capture the complex and dynamic behavior of living cells. While hybrid models that combine machine learning with mechanistic equations have emerged as a powerful tool, they often require relatively large datasets and may yield inconsistent predictions that violate the stoichiometry of metabolism. In this study, we introduce FBA-Hyb, a multi-scale hybrid modeling framework that tightly integrates genome-scale metabolic networks into its architecture via flux balance analysis (FBA). In our FBA-Hyb framework, artificial neural networks predict key FBA inputs (substrate uptake rates and cellular objectives), while a surrogate FBA module translates them into the metabolic fluxes that govern the bioprocess. A key novelty is that the FBA optimization step is replaced by a surrogate generated with symbolic regression, which encapsulates the FBA model in a compact analytical expression. This enables straightforward backpropagation through the integration of the FBA-Hyb bioprocess model, which is based on neural controlled differential equations. We validated FBA-Hyb against a standard hybrid model (Std-Hyb) using two Escherichia coli fed-batch case studies. In the first case study, FBA-Hyb achieved a 42% average improvement in predictive accuracy (R²) during leave-one-process-out cross-validation. Crucially, FBA-Hyb maintained strict stoichiometric feasibility even during extrapolation, whereas an alternative approach based on standard architectures led to stoichiometrically inconsistent solutions in 22% of the cases analyzed. In the second case study, we demonstrate how FBA-Hyb effectively simulates unmeasured chemical species and discovers a metabolic shift in sulfate-limited regimes during bioprocessing.
By providing a modular, biologically consistent, and computationally efficient architecture, FBA-Hyb offers a robust foundation for the next generation of bioprocess models and sustainable process optimization.
Highlights:
- FBA-Hyb integrates flux balance analysis (FBA) into hybrid bioprocess models.
- Symbolic regression discovers a simple closed-form FBA surrogate model.
- The FBA surrogate ensures accurate reaction stoichiometry.
- A neural network predicting the FBA objective keeps the model flexible.
- FBA-Hyb has superior capabilities and accuracy compared to the current standard.
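The FBA step that FBA-Hyb wraps is a linear program: maximise an objective flux (here biomass) subject to the steady-state constraint S v = 0 and bounds on each flux. The sketch below solves it with SciPy's `linprog` on a hypothetical four-reaction toy network, not a genome-scale E. coli model. Its final step fits a straight line to optimal growth versus uptake, a crude stand-in for the symbolic-regression surrogate described in the abstract rather than the authors' method.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B; reactions
#   v0: uptake   -> A          (0 <= v0 <= uptake_max)
#   v1: A        -> B
#   v2: A        -> by-product (secreted)
#   v3: 2 B      -> biomass    (objective)
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # steady-state balance of A
    [0.0,  1.0,  0.0, -2.0],   # steady-state balance of B
])


def fba_biomass(uptake_max: float) -> np.ndarray:
    """Maximise v3 subject to S v = 0 and flux bounds; returns the flux vector v."""
    c = np.array([0.0, 0.0, 0.0, -1.0])  # linprog minimises, so negate the objective
    bounds = [(0.0, uptake_max), (0.0, None), (0.0, None), (0.0, None)]
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x


if __name__ == "__main__":
    uptakes = np.linspace(0.0, 10.0, 11)
    growth = np.array([fba_biomass(u)[3] for u in uptakes])
    # For this toy network the optimum is growth = 0.5 * uptake, the kind of
    # closed-form relation a symbolic-regression surrogate would aim to recover.
    slope, intercept = np.polyfit(uptakes, growth, 1)
    print(f"growth ~= {slope:.3f} * uptake + {intercept:.3f}")
```

A closed-form surrogate like growth = 0.5 × uptake is differentiable everywhere, whereas the LP solution is only piecewise linear in its inputs; this is why replacing the optimizer makes backpropagation through the hybrid model straightforward.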

20

Non-invasive Transcriptomic Cell Profiling of the Human Endometrium with Generative Deep Learning

Meltsov, A.; Falcon-Perez, J. M.; Matorras, R.; Apostolov, A.; Sola-Leyva, A.; Esteki, M. Z.; Salumets, A.; Aleksejeva-Zagura, E.

2026-05-20 obstetrics and gynecology 10.64898/2026.05.18.26352867 medRxiv

Top 0.1%

12.2%

Background: Delineating the cellular origins of extracellular vesicles (EVs) enables the detection of clinically relevant changes in dynamic and complex tissues, such as the endometrium, that cannot be characterized with single-biomarker assays. Deconvolving transcriptomes into cellular composition with deep learning methods provides a means to explore this complexity. However, such computational methods have not previously been applied to EV bulk transcriptomes, and their efficacy in profiling EV population changes, as well as their concordance with tissue throughout the menstrual cycle, remain unknown. Methods: This observational cross-sectional study used a generative deep learning deconvolution algorithm, BulkTrajBlend, trained on a comprehensive human endometrial single-cell RNA sequencing (scRNA-seq) atlas. The model was applied to deconvolve paired bulk transcriptomes from endometrial tissue and uterine fluid EVs (UF-EVs) across the proliferative (P, n=4), early-secretory (ES, n=5), mid-secretory (MS, n=5), and late-secretory (LS, n=5) phases from healthy, fertile women. To validate generalizability, independent UF-EV datasets (ES, n=12; MS, n=12) obtained via different laboratory protocols were included. Deconvolved pseudo-single-cell (pSC) profiles from UF-EV data were subsequently integrated with Visium spatial transcriptomics slides of human endometrium (P, n=2; MS, n=4; ES, n=2). Results: We developed a foundation model-based approach using self-supervised learning to determine the cellular origin of EVs from their transcriptomic profiles. By mapping the generated pSC profiles to spatial transcriptomic data, we evaluated the spatial origins of EVs. Statistical analysis demonstrated that UF-EV transcriptome deconvolution reflects the dynamic changes in the cellular composition of endometrial tissue across the menstrual cycle phases.
The ability to accurately distinguish between proliferative and decidualizing menstrual cycle phases (ROC-AUC = 0.98) using cellular profiles from deconvolved UF-EV transcriptomes enables non-invasive profiling of endometrial tissue. Conclusions: Our findings indicate the feasibility of determining endometrial tissue cellular composition using UF-EV transcriptomics. This methodology enables refined, non-invasive endometrial testing that avoids invasive biopsy procedures. Based on the deconvolution results, we are able to correlate UF-EV content with tissue and distinguish between menstrual cycle phases. These results build toward a multifactorial screening method for abnormalities within the endometrium.
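To make the reported ROC-AUC of 0.98 concrete: ROC-AUC equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one (the Mann-Whitney formulation). The dependency-free sketch below computes it that way. The scores and labels are hypothetical and are not derived from the study's deconvolution output.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores (e.g., a predicted "decidualizing" probability derived
    # from deconvolved cell-type proportions) and phase labels (1 = decidualizing).
    scores = [0.10, 0.40, 0.35, 0.80]
    labels = [0, 0, 1, 1]
    print(roc_auc(scores, labels))  # prints 0.75
```

The pairwise loop is quadratic in sample size, which is unproblematic for cohorts of a few dozen samples; for large datasets, a rank-based computation gives the same value more efficiently.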